
A Search Engine for Historical Manuscript Images


Abstract

Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to enable public access to them. These collections are only available in image formats and require expensive manual annotation work for access to them. Current handwriting recognizers have word error rates in excess of 50% and therefore cannot be used for such material. We describe two statistical models for retrieval in large collections of handwritten manuscripts given a text query. Both use a set of transcribed page images to learn a joint probability distribution between features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. We show experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection. Experiments show that the precision at 20 documents is about 0.4 to 0.5 depending on the model. To the best of our knowledge, this is the first automatic retrieval system for historical manuscripts using text queries, without manual transcription of the original corpus.
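
The abstract reports retrieval quality as precision at 20 documents. As a minimal sketch of how that metric is commonly computed (the function name, argument names, and page identifiers below are hypothetical and do not come from the paper), the score is the fraction of the top 20 ranked items that are relevant to the query:

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Return the fraction of the top-k ranked items that are relevant.

    Assumption: the denominator is always k, even when fewer than k
    items were retrieved. This is a common convention, not necessarily
    the one used in the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k


# Hypothetical ranking of page images returned for one text query.
ranking = ["p3", "p7", "p1", "p9"]
relevant_pages = {"p3", "p1"}
print(precision_at_k(ranking, relevant_pages, k=4))
```

Under this convention, a reported value of about 0.4 to 0.5 would correspond to roughly 8 to 10 relevant items among the top 20 results for a query.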
